{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Learning with a single neuron using Flux.jl\n\nIn this notebook, we'll use `Flux` to create a single neuron and teach it to learn, as we did by hand in notebook 10!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Read in data and process it\n\nLet's start by reading in our data"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "using CSV\nusing TextParse\nusing DataFrames\n\napplecols, applecolnames = TextParse.csvread(\"data/Apple_Golden_1.dat\", '\\t')\nbananacols, bananacolnames = TextParse.csvread(\"data/bananas.dat\", '\\t')\n\napples = DataFrame(Dict(strip(name)=>col for (name, col) in zip(applecolnames, applecols)))\nbananas = DataFrame(Dict(strip(name)=>col for (name, col) in zip(bananacolnames, bananacols)));"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "and processing it to extract information about the red and green coloring in our images:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "col1 = :red\ncol2 = :green\n\nx_apples  = [ [apples[i, col1], apples[i, col2]] for i in 1:size(apples)[1] ]\nx_bananas = [ [bananas[i, col1], bananas[i, col2]] for i in 1:size(bananas)[1] ]\n\nxs = vcat(x_apples, x_bananas)\n\nys = vcat( zeros(size(x_apples)[1]), ones(size(x_bananas)[1]) );"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "The input data is in `xs` and the labels (true classifications as bananas or apples) in `ys`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Using `Flux.jl`"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we can load `Flux` to really get going!"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "using Flux"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "We saw in the last notebook that σ is a built-in function in `Flux`.\n\nAnother function that is used a lot in neural networks is called `ReLU`; in Julia, the function is called `relu`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 1\n\nUse the docs to discover what `ReLU` is all about.\n\n`relu.([-3, 3])` returns\n\nA) [-3, 3] <br>\nB) [0, 3] <br>\nC) [0, 0] <br>\nD) [3, 3] <br>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Making a single neuron in Flux"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's use `Flux` to build our neuron with 2 inputs and 1 output:\n\n <img src=\"data/single-neuron.png\" alt=\"Drawing\" style=\"width: 500px;\"/>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We previously put the two weights in a vector, $\\mathbf{w}$. Flux instead puts weights in a $1 \\times 2$ matrix (i.e. a matrix with 1 *row* and 2 *columns*). \n\nPreviously, to compute the dot product of $\\mathbf{w}$ and $\\mathbf{x}$ we had to use either the `dot` function, or we had to transpose the vector $\\mathbf{w}$:\n\n```julia\n# transpose w\nb = w' * x\n# or use dot!\nb = dot(w, x)\n```\nIf the weights are instead stored in a $1 \\times 2$ matrix, `W`, then we can simply multiply `W` and `x` together instead!\n\nWe start off with random values for our parameters and data:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W = rand(1, 2)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "x = rand(2)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note that the product of `W` and `x` will now be an array (vector) with a single element, rather than a single number:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W * x"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "This means that our bias `b` is treated as an array when we're using `Flux`:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "b = rand(1)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 2\n\nWrite a function `mypredict` that will take a single input, array `x` and use `W`, `b`, and built-in `σ` to generate an output prediction (stored as an array). This function defines our neural network!\n\nHint: This function will look very similar to $f_{\\mathbf{w},\\mathbf{b}}$ from the last notebook but has changed since our data structures to store our parameters have changed!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 3\n\nDefine a loss function called `loss`.\n\n`loss` should take two inputs: a vector storing data, `x`, and a vector storing the correct \"labels\" for that data. `loss` should return the sum of the squares of differences between the predictions and the correct labels."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Calculating gradients using Flux: backpropagation"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "For learning, we know that what we need is a way to calculate derivatives of the `loss` function with respect to the parameters `W` and `b`. So far, we have been doing that using finite differences. \n\n`Flux.jl` instead implements a numerical method called **backpropagation** that calculates gradients (essentially) exactly, in an automatic way, by indirectly applying the rules of calculus.\nTo do so, it provides a new type of object called \"tracked\" arrays. These are arrays that store not only their current value, but also information about gradients, which is used by the backpropagation method.\n\n[If you want to understand the maths behind backpropagation, we recommend e.g. [this lecture](https://www.youtube.com/watch?v=i94OvYb6noo).]"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "To do so, `Flux` provides a function `param` to define such objects that will contain the information for a *param*eter."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's start, as usual, by setting up some random initial values for the parameters:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W_data = rand(1, 2)  \nb_data = rand(1)\n\nW_data, b_data"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "We now set up `Flux.jl` objects that will contain these values *and* their derivatives, and allow to propagate\nthis information around:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W = param(W_data)\nb = param(b_data)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Here, `param` is a function that `Flux` provides to create an object that represents a parameter of a machine learning model, i.e. an object which has both a value and derivative information, and such that other objects know how to *keep track* of when it is used in an expression."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 4\n\nWhat type does `W` have?\n\nA) Array (1D) <br>\nB) Array (2D) <br>\nC) TrackedArray (1D) <br>\nD) TrackedArray (2D) <br>\nE) Parameter (1D) <br>\nF) Parameter (2D) <br>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 5\n\n`W` stores not only its current value, but also has space to store gradient information. You can access the values and gradient of the weights as follows:\n\n```julia\nW.data\nW.grad\n```\n\nAt this point, are the values of the weights or the gradient of the weights larger?\n\nA) the values of the weights <br>\nB) the gradient of the weights"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 6\n\nFor data `x` and `y` where\n\n```julia\nx, y = [0.413759, 0.692204], [0.845677]\n```\napply the loss function to `x` and `y` to give a new variable `l`. What is the type of `l`? (How many dimensions does it have?)\n\nA) Array (0D) <br>\nB) Array (1D) <br>\nC) TrackedArray (0D) <br>\nD) TrackedArray (1D)<br> \nE) Float64<br>\nF) Int64<br>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Stochastic gradient descent"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We can now use these features to reimplement stochastic gradient descent, following the method we used in the previous notebook, but now using backpropagation!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 7\n\nModify the code from the previous notebook for stochastic gradient descent to use Flux instead."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Investigating stochastic gradient descent"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's look at the values stored in `b` before we run stochastic gradient descent:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "b"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "After running `stochastic_gradient_descent`, we find the following:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W_final, b_final = stochastic_gradient_descent(loss, W, b, xs, ys, 100000)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "we can look at the values of `W_final` and `b_final`, which our machine learned to generate our desired classification."
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "W_final"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "b_final"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 8\n\nPlot the data and the learned function."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 9\n\nDo this plot every so often as the learning process is proceeding in order to have an animation of the process."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Automation with Flux.jl"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We will need to repeat the above process for a lot of different systems.\nFortunately, `Flux.jl` provides us with tools to automate this!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Flux allows to create a neuron in a simple way:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "using Flux"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "model = Dense(2, 1, σ)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "The `2` and `1` refer to the number of inputs and outputs, and the neuron is defined using the $\\sigma$ function."
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "typeof(model)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "We have made an object of type `Dense`, defined by `Flux`, with the name `model`. This represents a \"dense neural network layer\" (see later for more on neural network layers).\nThe parameters that will be modified during the learning process live *inside* the `model` object."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 10\n\nInvestigate which variables live inside the `model` object and what type they are. How does that compare to the call to create the `Dense` object that we started with?"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Model object as a function"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We can apply the `model` object to data just as if it were a standard function:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "model(rand(2))"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 11\n\nProve to yourself that you understand what is going on when we call `model`. Create two arrays `W` and `b` with the same elements as `model.W` and `model.b`. Use `W` and `b` to generate the same answer that you get when we call `model([.5, .5])`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Using Flux"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We now need to provide Flux with three pieces of information: \n\n1. A loss function\n2. Some training data\n3. An optimization method"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Loss functions"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Flux has various loss functions built in, for example the mean-squared error (`mse`) that we have been using:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "loss(x, y) = Flux.mse(model(x), y)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Another common one is the cross entropy, `Flux.crossentropy`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Data"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "The data can take a couple of different forms. \nOne form is a single **iterator**, consisting of pairs $(x, y)$ of data and labels.\nWe can achieve this with `zip`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 12\n\nUse `zip` to \"zip together\" `xs` and `ys`. Then use the `collect` function to check what `zip` actually does."
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Optimization routine"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we need to tell Flux what kind of optimization routine to use. It has several built in; the standard stochastic gradient descent algorithm that we have been using is called `SGD`. We must pass it two things: a list of parameter objects which will be modified by the optimization routine, and a step size:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "opt = SGD([model.W, model.b], 0.01)\n# give a list of the parameters that will be modified"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "The gradient calculations and parameter updates will be carried out by this optimizer function; we do not see those details, but if you are curious, you can, of course, look at the `Flux.jl` source code!"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Training"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "We now have all the pieces in place to actually **train** our model (a single neuron) on the data. \n\"Training\" refers to using pre-labeled data to learn the function that relates the input data to the desired output data given by the labels.\n\n`Flux` provides the function `train!`, which performs a single pass through the data and does a single step of optimization using the partial cost function for each data point:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "Flux.train!(loss, data, opt)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "We can then just repeat this several times to train the network more and coax it towards the minimum of the cost function:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "for i in 1:100\n    Flux.train!(loss, data, opt)\nend"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's look at the parameters after training:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "model.W"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "model.b"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "Instead of writing out a list of parameters to modify, `Flux` provides the function `params`, which extracts all available parameters from a model:"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "opt = SGD(params(model), 0.01)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        "params(model)"
      ],
      "metadata": {},
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Adding more features"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Exercise 13\n\nSo far we have just used two features, red and green. \n\n(i) Add a third feature, blue. Plot the new data.\n\n(ii) Train a neuron with 3 inputs and 1 output on the data.\n\n(iii) Can you find a good way to visualize the result?"
      ],
      "metadata": {}
    },
    {
      "outputs": [],
      "cell_type": "code",
      "source": [
        ""
      ],
      "metadata": {},
      "execution_count": null
    }
  ],
  "nbformat_minor": 2,
  "metadata": {
    "language_info": {
      "file_extension": ".jl",
      "mimetype": "application/julia",
      "name": "julia",
      "version": "0.6.2"
    },
    "kernelspec": {
      "name": "julia-0.6",
      "display_name": "Julia 0.6.2",
      "language": "julia"
    }
  },
  "nbformat": 4
}
